
model: (qwen3next) correct vectorized key_gdiff calculation#19324

Merged
ngxson merged 2 commits into ggml-org:master from ngxson:xsn/qwen3_next_key_gdiff
Feb 4, 2026

Conversation

@ngxson
Contributor

@ngxson ngxson commented Feb 4, 2026

Testing with the provided prompt from #19305

image

@ngxson
Contributor Author

ngxson commented Feb 4, 2026

Quite fun: after applying 4bfbf0b, I asked the model to identify the bug (giving it the code from before that commit). It successfully identified the problem and even suggested one more improvement (commit d871ac8)

image

@Mushoz

Mushoz commented Feb 4, 2026

It looks like we've officially arrived at self-improving AI ;)

@ggerganov
Member

My test cases that were failing before are now passing with this change.

@github-actions github-actions bot added the model Model specific label Feb 4, 2026
@ngxson
Contributor Author

ngxson commented Feb 4, 2026

I updated the compare-logprobs script and reran it. There are still some divergences from vLLM (I suppose due to numerical issues), but it does look better at long context (see tokens past depth 5000):

PR

| idx | logits_llama.log | logprob_1 | logits_vllm.log | logprob_2 | diff (abs) |
|-----|------------------|-----------|-----------------|-----------|------------|
| 1 | `' '` | -3.0408 | `' '` | -3.0440 | 0.0033 |
| 2 | `'\n\n'` | -0.6087 | `'\n\n'` | -0.5918 | 0.0170 |
| 3 | `' API'` | -0.7177 | `' API'` | -0.8431 | 0.1254 |
| 4 | `' lightweight'` | -0.2557 | `' lightweight'` | -0.2838 | 0.0281 |
| 5 | `' and'` | -0.1517 | `' and'` | -0.1594 | 0.0077 |
| 6 | `' C'` | -0.0635 | `' C'` | -0.0332 | 0.0302 |
| 7 | `' HTTP'` | -0.0113 | `' HTTP'` | -0.0080 | 0.0032 |
| 8 | `' server'` | -0.0037 | `' server'` | -0.0066 | 0.0029 |
| 9 | `' based'` | -0.0240 | `' based'` | -0.0691 | 0.0451 |
| 10 | `' on'` | -0.0000 | `' on'` | -0.0000 | 0.0000 |
| 1011 | `' GPU'` | -1.0844 | `' GPU'` | -1.1533 | 0.0689 |
| 1012 | `' parameters'` | -0.0969 | `' parameters'` | -0.1143 | 0.0175 |
| 1013 | `' to'` | -0.2201 | `' to'` | -0.1712 | 0.0488 |
| 1014 | `' fit'` | -0.0660 | `' fit'` | -0.0926 | 0.0266 |
| 1015 | `' model'` | -0.1862 | `' model'` | -0.3159 | 0.1297 |
| 1016 | `' available'` | -0.3698 | `' available'` | -0.5154 | 0.1456 |
| 1017 | `' memory'` | -0.0490 | `' memory'` | -0.0509 | 0.0019 |
| 1018 | `' ('` | -0.0223 | `' ('` | -0.0401 | 0.0178 |
| 1019 | `' or'` | -0.1865 | `' or'` | -0.3119 | 0.1253 |
| 1020 | `' ''` | -0.0016 | `' ''` | -0.0011 | 0.0005 |
| 5021 | `' tokens'` | -0.0002 | `' tokens'` | -0.0001 | 0.0001 |
| 5022 | `' at'` | -0.0000 | `' at'` | -0.0000 | 0.0000 |
| 5023 | `' a'` | -0.6503 | `' minimum'` | -0.6290 | 0.0213 |
| 5024 | `' Default'` | -0.0021 | `' Default'` | -0.0005 | 0.0015 |
| 5025 | `` ' `' `` | -0.0000 | `` ' `' `` | -0.0000 | 0.0000 |
| 5026 | `' Set'` | -0.7499 | `' Time'` | -0.6461 | 0.1037 |
| 5027 | `' a'` | -0.1898 | `' a'` | -0.2087 | 0.0189 |
| 5028 | `' time'` | -0.0009 | `' time'` | -0.0005 | 0.0003 |
| 5029 | `' limit'` | -0.0007 | `' limit'` | -0.0009 | 0.0002 |
| 5030 | `' for'` | -0.6250 | `' ('` | -0.7296 | 0.1046 |

master

| idx | logits_llama.log | logprob_1 | logits_vllm.log | logprob_2 | diff (abs) |
|-----|------------------|-----------|-----------------|-----------|------------|
| 1 | `' '` | -3.0408 | `' '` | -3.0440 | 0.0033 |
| 2 | `'\n\n'` | -0.6088 | `'\n\n'` | -0.5918 | 0.0170 |
| 3 | `' API'` | -0.8385 | `' API'` | -0.8431 | 0.0046 |
| 4 | `' lightweight'` | -0.2408 | `' lightweight'` | -0.2838 | 0.0430 |
| 5 | `' pure'` | -0.6919 | `' and'` | -0.1594 | 0.5325 |
| 6 | `' C'` | -0.0190 | `' C'` | -0.0332 | 0.0142 |
| 7 | `' HTTP'` | -0.0373 | `' HTTP'` | -0.0080 | 0.0293 |
| 8 | `' server'` | -0.0021 | `' server'` | -0.0066 | 0.0044 |
| 9 | `' based'` | -0.6722 | `' based'` | -0.0691 | 0.6031 |
| 10 | `' on'` | -0.0001 | `' on'` | -0.0000 | 0.0000 |
| 1011 | `' GPU'` | -1.3448 | `' GPU'` | -1.1533 | 0.1915 |
| 1012 | `' GPU'` | -0.6565 | `' parameters'` | -0.1143 | 0.5422 |
| 1013 | `' to'` | -0.7436 | `' to'` | -0.1712 | 0.5724 |
| 1014 | `' fit'` | -0.1738 | `' fit'` | -0.0926 | 0.0812 |
| 1015 | `' model'` | -0.6253 | `' model'` | -0.3159 | 0.3094 |
| 1016 | `' available'` | -0.9195 | `' available'` | -0.5154 | 0.4041 |
| 1017 | `' memory'` | -0.0584 | `' memory'` | -0.0509 | 0.0075 |
| 1018 | `' ('` | -0.0105 | `' ('` | -0.0401 | 0.0296 |
| 1019 | `'/''` | -0.9424 | `' or'` | -0.3119 | 0.6305 |
| 1020 | `' ''` | -0.0020 | `' ''` | -0.0011 | 0.0009 |
| 5021 | `' tokens'` | -0.0002 | `' tokens'` | -0.0001 | 0.0001 |
| 5022 | `' at'` | -0.0004 | `' at'` | -0.0000 | 0.0004 |
| 5023 | `' minimum'` | -0.2551 | `' minimum'` | -0.6290 | 0.3738 |
| 5024 | `' Default'` | -0.0005 | `' Default'` | -0.0005 | 0.0000 |
| 5025 | `` ' `' `` | -0.0000 | `` ' `' `` | -0.0000 | 0.0000 |
| 5026 | `' Set'` | -0.0209 | `' Time'` | -0.6461 | 0.6252 |
| 5027 | `' a'` | -0.3147 | `' a'` | -0.2087 | 0.1061 |
| 5028 | `' time'` | -0.0003 | `' time'` | -0.0005 | 0.0002 |
| 5029 | `' limit'` | -0.0062 | `' limit'` | -0.0009 | 0.0053 |
| 5030 | `' in'` | -0.5323 | `' ('` | -0.7296 | 0.1973 |
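For reference, the diff column in the tables above can be reproduced with a small sketch along these lines (hypothetical helper; the actual compare-logprobs script and its log format may differ):

```python
# Hypothetical sketch: per-position absolute differences between the
# top-token logprobs reported by two engines (llama.cpp vs vLLM).
def compare_logprobs(lp_a, lp_b):
    """Return the absolute difference at each token position."""
    assert len(lp_a) == len(lp_b)
    return [abs(a - b) for a, b in zip(lp_a, lp_b)]

# First three rows of the "PR" table above:
llama = [-3.0408, -0.6087, -0.7177]
vllm  = [-3.0440, -0.5918, -0.8431]
diffs = compare_logprobs(llama, vllm)

# Note: the table's diff column was computed from unrounded logprobs,
# so the last digit can differ from what these rounded values give.
print([round(d, 4) for d in diffs])
```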

@ngxson ngxson merged commit 8abcc70 into ggml-org:master Feb 4, 2026
66 of 75 checks passed
@Mushoz

Mushoz commented Feb 4, 2026

It does still deviate in the token that was picked at position 5030. Shouldn't numerical precision issues still result in the same token?

@ngxson
Contributor Author

ngxson commented Feb 4, 2026

Not always; numerical differences can accumulate enough to change the output logits. But I think it may also depend on the quantization I'm using (q8_0). Will need to do more testing, but for now I think the current fix should already be good enough.
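A minimal illustration of the point (not tied to this PR's kernels, just standard floating-point behavior): the same sum, evaluated in a different order, can round differently, and near-tied logits then flip under greedy sampling.

```python
import numpy as np

# The same sum in a different order rounds differently in float32:
a = np.float32(1e8)
b = np.float32(1.0)
left = (a + b) - a   # 1e8 + 1 rounds back to 1e8 in float32, so this is 0.0
right = b + (a - a)  # exact, so this is 1.0
print(left, right)   # 0.0 1.0

# When two logits are nearly tied, an error of this size is enough to
# change which token has the larger logit (and thus gets picked greedily):
logits = np.array([2.0, 2.0 + 1e-7])
print(int(np.argmax(logits)))                 # 1
print(int(np.argmax(logits + [2e-7, 0.0])))   # 0: a tiny perturbation flips the pick
```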

@ggerganov
Member

Btw, it's quite funny watching the ollama bros copy-pasting our bugs into their "new engine" 🤣. Let's see how long it will take them to realize.

https://github.com/ollama/ollama/pull/14051/changes#diff-1b8f23e564159d80674c3a97ca9f02489ad6ee90bf956f4ed92062811a6be0e5R447-R453

image

@CISC
Member

CISC commented Feb 4, 2026

Btw, it's quite funny watching the ollama bros copy-pasting our bugs into their "new engine" 🤣. Let's see how long it will take them to realize.

That's the only reason we write buggy code, right? * cough *

@CISC
Member

CISC commented Feb 4, 2026

@ngxson
Contributor Author

ngxson commented Feb 4, 2026

@ngxson https://github.com/ggml-org/llama.cpp/actions/runs/21670903144/job/62478218069

hmm ok, I ran editorconfig locally but it didn't catch this earlier; I was probably on the wrong branch. Pushing a fix along with #19331

@fizzAI

fizzAI commented Feb 4, 2026

Btw, it's quite funny watching the ollama bros copy-pasting our bugs into their "new engine" 🤣. Let's see how long it will take them to realize.

https://github.com/ollama/ollama/pull/14051/changes#diff-1b8f23e564159d80674c3a97ca9f02489ad6ee90bf956f4ed92062811a6be0e5R447-R453
image

I 100% bet you they're just vibe-translating LCPP PRs to Go. lollllll

@Mushoz

Mushoz commented Feb 4, 2026

Do GGUFs need to be regenerated after this change? I was under the impression that wouldn't be needed, but this message by the Unsloth team tells me that I do: https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/discussions/5

@CISC
Member

CISC commented Feb 4, 2026

Do GGUFs need to be regenerated after this change? I was under the impression that wouldn't be needed, but this message by the Unsloth team tells me that I do: https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/discussions/5

No, there are no conversion changes; no idea why they reconverted the model.

@ngxson
Contributor Author

ngxson commented Feb 4, 2026

It should only affect I-quants, since the imatrix is generated from intermediate activations.

Normal quants (Qx_0, Qx_1, Qx_K) should not be affected.

@CISC
Member

CISC commented Feb 4, 2026

Ah, yes, imatrix would be affected.

@danielhanchen
Contributor

Oh yes, Q8_K_XL, Q8_0, BF16, and MXFP4_MOE are fine; the rest are imatrix quants, so they did change a bit

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
…#19324)

* model: (qwen3next) correct vectorized key_gdiff calculation

* move transpose to outside of loop

Labels

model Model specific


6 participants